METHODS FOR OBJECTIVE MEASUREMENT OF VIDEO QUALITY

BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention relates to methods for objective measurement of video quality and to an
optimization method that finds the best linear combination of various parameters.

2. Description of the Related Art

Traditionally, video quality is evaluated subjectively by a number of evaluators. The
evaluation can be done with or without reference videos. In referenced evaluation, evaluators
are shown two videos: the original (reference) video and the processed video that is to be
compared with the original video. By comparing the two videos, the evaluators give subjective
scores to the videos. Therefore, it is often called a subjective test of video quality. Although
the subjective test is considered to be the most accurate method since it reflects human
perception, it has several limitations. First of all, it requires a number of evaluators. Thus, it
is time-consuming and expensive. Furthermore, it cannot be done in real time. As a result, there
has been a great interest in developing objective methods for video quality measurement.
Typically, the effectiveness of an objective test is measured in terms of correlation with the
subjective test scores. In other words, the objective test that provides test scores that most
closely match the subjective scores is considered to be the best.

In the present invention, new methods for objective measurement of video quality are provided
using the wavelet transform. In particular, the fact that the sensitivity of the human visual
system varies with spatio-temporal frequency is taken into account. In order to compute the
spatio-temporal frequencies, the wavelet transform is used. In order to take into account the
temporal frequencies, a modified 3-D wavelet transform is provided. The differences in the
spatio-temporal frequencies are calculated by summing the difference (squared error) of the
wavelet coefficients in each subband. Then, the differences in the spatio-temporal frequencies
are represented as a vector. Each component of this vector represents a difference in a
certain spatio-temporal frequency band. From this vector, a number is computed as a weighted
sum of the elements of the vector and that number is used as an objective quality measurement.
In order to find the optimal weight vector, an optimization procedure is provided. The
procedure is optimal in the sense that it gives the largest correlation with the subjective
scores.

SUMMARY OF THE INVENTION

Due to the limitations of the subjective test, there is an urgent need for a method for objective 
measurement of video quality. In the present invention, new methods for objective 
measurement of video quality using the wavelet transform are provided. The wavelet transform 
can exploit the characteristic of the human visual system that its sensitivity varies with
spatio-temporal frequency. The wavelet transform analysis produces a number of parameters, which can be
used to produce an objective score. In the present invention, the parameters are represented as a 
parameter vector, from which a number is computed. Then, the number is used as an objective 
score. In order to find the best linear combination of the parameters, an optimization procedure 
is provided. 

Therefore, it is an object of the present invention to provide new methods for objective 
measurement of video quality utilizing the wavelet transform. 

It is another object of the present invention to provide an optimization procedure that finds the 
best linear combination of various parameters that are obtained for objective measurement of 
video quality. 

Other objects, features, and advantages of the present invention will be apparent from the
following detailed description.

BRIEF DESCRIPTION OF THE DRAWING

Fig. 1a shows an original image.

Fig. 1b shows an example of a 3-level wavelet transform of the original image of Fig. 1a.

Fig. 2 illustrates the subband block index of a 3-level wavelet transform.

Fig. 3 illustrates how the squared error in the i-th block is computed.

Fig. 4a illustrates how the modified 3-dimensional wavelet transform is computed. 

Fig. 4b illustrates how a new difference vector is computed.

DESCRIPTION OF THE ILLUSTRATED EMBODIMENTS

Embodiment 1 

The present invention for objective video quality measurement is a full reference method. In 
other words, it is assumed that a reference video is provided. In general, videos can be 
understood as a sequence of frames. One of the simplest ways to measure the quality of a 
processed video is to compute the mean squared error between the reference and processed 
videos as follows: 

$$ e_{mse} = \frac{1}{LMN} \sum_{l=1}^{L} \sum_{m=1}^{M} \sum_{n=1}^{N} \left( U(l,m,n) - V(l,m,n) \right)^2 $$

where $U$ represents the reference video and $V$ the processed video. $M$ is the number of pixels in
a row, $N$ the number of pixels in a column, and $L$ the number of frames. However, the
sensitivity of the human visual system varies in different frequencies. In other words, the 
human eye may perceive the differences in various frequency components differently and this 
characteristic of the human visual system can be exploited to develop an objective measurement 
method for video quality. Instead of computing the mean square error between the reference 
and processed videos, a weighted difference of various frequency components between the 
reference and processed videos is used in the present invention. There are mainly two types of 
frequency components for video signals: spatial frequency components and temporal frequency 
components. High spatial frequencies indicate sudden changes in pixel values within a frame. 
High temporal frequencies indicate rapid movements along a sequence of frames. In the case of 
color videos, there are three color components and frequency components can be computed for 
each color. A number of techniques have been used to compute the frequency components, and
some of the most widely used methods include the Fourier transform and wavelet transform. In 
the present invention, the wavelet transform is used. However, it is noted that one may use the 
Fourier transform and still benefit from the teaching of the present invention. 
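For illustration only, the mean squared error baseline described above can be sketched in Python as follows. The nested-list layout (indexed by frame, row, and column), the function name, and the sample values are assumptions made for this sketch and are not taken from the specification.

```python
def mean_squared_error(reference, processed):
    """Mean squared error between two videos stored as nested lists
    indexed as video[frame][row][column]."""
    total = 0.0
    count = 0
    for ref_frame, proc_frame in zip(reference, processed):
        for ref_row, proc_row in zip(ref_frame, proc_frame):
            for u, v in zip(ref_row, proc_row):
                total += (u - v) ** 2
                count += 1
    return total / count


# Two hypothetical 2-frame, 2x2 videos that differ in a single pixel.
reference = [[[10, 20], [30, 40]], [[50, 60], [70, 80]]]
processed = [[[10, 20], [30, 40]], [[50, 60], [70, 84]]]
print(mean_squared_error(reference, processed))  # (84 - 80)^2 / 8 pixels = 2.0
```

Every pixel contributes equally to this value, which is the property the weighted frequency-domain difference is intended to replace.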

Fig. 1b shows an example of a 3-level wavelet transform of the original image of Fig. 1a. In a
3-level wavelet transform, there are 10 blocks, as can be seen in Fig. 2. Each block represents
a different range of spatial frequency components. The block 120 in the upper left-hand corner
represents the lowest spatial frequency component of the frame and the block 121 in the lower
right-hand corner the highest spatial frequency component. In a 2-level wavelet transform, there
are 7 blocks. On the other hand, in a 4-level wavelet transform, there are 13 blocks.
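The block counts quoted above (7, 10, and 13) are consistent with a standard dyadic decomposition, in which each level splits the current lowest-frequency block into four and thereby adds three detail blocks. Assuming that structure, the count can be sketched as follows; the function name is hypothetical.

```python
def subband_block_count(levels):
    """Number of blocks in a 2-D dyadic wavelet decomposition.

    Each level adds three detail blocks, and one lowest-frequency block
    remains after the last level.
    """
    return 3 * levels + 1


for levels in (2, 3, 4):
    print(levels, subband_block_count(levels))
```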

In order to compute spatial frequency components, the wavelet transform is applied to each 
frame of source and processed videos. Then, the difference (squared error) of the wavelet 
coefficients in each block is computed and summed, as illustrated in Fig. 3. In other words, the 
difference in the i-th block is computed as follows: 

$$ d_i = \sum_{j \in i\text{-th block}} \left( c_{ref,i,j} - c_{proc,i,j} \right)^2 \qquad (1) $$

where $c_{ref,i,j}$ is a wavelet coefficient of the i-th block of the reference video and
$c_{proc,i,j}$ is the corresponding wavelet coefficient of the processed video. This will produce
10 values that can be represented as a vector, assuming that a 3-level wavelet transform is
applied. Each element of the vector represents the difference of the corresponding subband
block. Repeating this procedure over all frames produces a sequence of vectors. In other words,
the difference vector of the l-th frame is represented as follows:

$$ D_l = \begin{bmatrix} d_{l,1} \\ d_{l,2} \\ \vdots \\ d_{l,K} \end{bmatrix} \qquad (2) $$

where $d_{l,i} = \sum_{j \in i\text{-th block}} \left( c_{ref,l,i,j} - c_{proc,l,i,j} \right)^2$ is the
sum of the squared errors in the i-th block, $c_{ref,l,i,j}$ is a wavelet coefficient of the i-th
block of the l-th frame of the reference video, $K$ is the number of blocks in the 2-D wavelet
transform, and $c_{proc,l,i,j}$ is a wavelet coefficient of the i-th block of the l-th frame of the
processed video. It is noted that there are many other ways to compute the difference, such as
absolute differences.
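The per-frame difference vector of equation (2) can be sketched as follows. This is a minimal illustration that assumes a Haar wavelet with pairwise averaging, frames stored as nested lists whose dimensions are divisible by 2 raised to the number of levels, and a block ordering that starts with the lowest-frequency block. None of these choices is prescribed by the specification, and the block order need not match the index of Fig. 2.

```python
def haar_step_1d(values):
    """One Haar step: pairwise averages (low band) and pairwise half-differences (high band)."""
    half = len(values) // 2
    low = [(values[2 * k] + values[2 * k + 1]) / 2.0 for k in range(half)]
    high = [(values[2 * k] - values[2 * k + 1]) / 2.0 for k in range(half)]
    return low, high


def split_columns(rows):
    """Apply one Haar step down every column of a 2-D block."""
    cols = [haar_step_1d(list(col)) for col in zip(*rows)]
    low = [list(r) for r in zip(*(c[0] for c in cols))]
    high = [list(r) for r in zip(*(c[1] for c in cols))]
    return low, high


def haar_step_2d(block):
    """Split a 2-D block into one low-frequency block and three detail blocks."""
    row_low, row_high = [], []
    for row in block:
        low, high = haar_step_1d(row)
        row_low.append(low)
        row_high.append(high)
    ll, lh = split_columns(row_low)
    hl, hh = split_columns(row_high)
    return ll, lh, hl, hh


def wavelet_blocks(frame, levels):
    """Return the 3 * levels + 1 subband blocks, lowest-frequency block first."""
    details = []
    current = frame
    for _ in range(levels):
        current, lh, hl, hh = haar_step_2d(current)
        details = [lh, hl, hh] + details  # coarser levels go in front
    return [current] + details


def difference_vector(ref_frame, proc_frame, levels=3):
    """Sum of squared coefficient differences in each block (equation (1) per block)."""
    vector = []
    for ref_block, proc_block in zip(wavelet_blocks(ref_frame, levels),
                                     wavelet_blocks(proc_frame, levels)):
        d = 0.0
        for ref_row, proc_row in zip(ref_block, proc_block):
            for c_ref, c_proc in zip(ref_row, proc_row):
                d += (c_ref - c_proc) ** 2
        vector.append(d)
    return vector


# Hypothetical 8x8 frame; the processed frame is uniformly brighter by 4.
reference = [[r * 8 + c for c in range(8)] for r in range(8)]
processed = [[value + 4 for value in row] for row in reference]
vector = difference_vector(reference, processed)
print(len(vector), vector)
```

In this example a uniform brightness shift changes only the lowest-frequency block, so all of the error is concentrated in the first element of the vector.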
Finally, the average of these vectors over all frames is computed as follows:

$$ D = \frac{1}{L} \sum_{l=1}^{L} D_l \qquad (3) $$

In the present invention, a number is computed as a weighted sum of the elements of the
average vector and the number will be used as an objective measurement of the processed
video. In other words, this new number is computed as follows:

$$ y = W^T D $$

where $W = [w_1, w_2, \ldots, w_K]^T$ is a weight vector, $D = [d_1, d_2, \ldots, d_K]^T$ and $K$ is
the size of the vector.

Embodiment 2

The difference in the i-th block of equation (1) is computed by summing the difference of the
wavelet coefficients for each pixel. However, the human eye may not notice the difference
between pixels whose difference is smaller than a threshold. Thus, the difference in the i-th
block may be computed to take into account this characteristic of the human visual system as
follows:

[equation not legible in the source]

where $t_0$ is the threshold.
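One way such a threshold might be applied is sketched below. The exact form used in this embodiment is not recoverable from the text, so the rule here is an assumption: a squared difference is counted only when the magnitude of the coefficient difference is at least the threshold. The function name and the sample coefficients are likewise hypothetical.

```python
def thresholded_block_difference(ref_coeffs, proc_coeffs, t0):
    """Sum of squared coefficient differences in one block, ignoring
    differences whose magnitude is below the threshold t0 (assumed rule)."""
    total = 0.0
    for c_ref, c_proc in zip(ref_coeffs, proc_coeffs):
        diff = c_ref - c_proc
        if abs(diff) >= t0:
            total += diff ** 2
    return total


# Hypothetical coefficients of one block.
ref_coeffs = [10.0, 4.0, -3.0, 8.0]
proc_coeffs = [10.5, 1.0, -3.2, 8.0]
print(thresholded_block_difference(ref_coeffs, proc_coeffs, t0=1.0))  # only the difference of 3.0 counts: 9.0
print(thresholded_block_difference(ref_coeffs, proc_coeffs, t0=0.0))  # no thresholding: all differences count
```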



Embodiment 3

The difference vector of equation (3) represents only spatial frequency differences. In order to
take into account the temporal frequency differences, a 3-D wavelet transform can be applied.
However, applying a 3-D wavelet transform to a video is a very expensive operation. It
requires a large amount of memory and takes a long processing time. In the present invention, a
modified 3-D wavelet transform is provided to take into account the temporal frequency
characteristics of videos. However, it is noted that one may use the conventional 3-D wavelet
transform and still benefit from the teaching of the present invention.

After computing the difference vector of equation (2) for every frame, a sequence of
difference vectors is obtained. The sequence of difference vectors can be arranged as a
2-dimensional array with each difference vector as a column of the 2-dimensional array (Fig. 4a).
Then, each row of the 2-dimensional array shows how the difference of each subband block
varies temporally. In order to compute temporal frequency characteristics, a 1-dimensional
wavelet transform is applied to each row of the 2-dimensional array whose columns are the
sequence of the difference vectors.

First, a window 140 is applied to each row of the 2-dimensional array, producing a segment of
the row, and the 1-dimensional wavelet transform is applied to the segment in the temporal
direction (Fig. 4a). Then, the squared sum of each subband of the 1-dimensional wavelet
transform of the j-th row of the l-th window is computed as follows:

$$ e_{l,j,i} = \sum_{k \in i\text{-th subband}} \left( c_{l,j,k} \right)^2 $$

where $l$ represents the l-th window, $j$ the j-th row, $i$ the i-th subband, and $c_{l,j,k}$ a
coefficient of the 1-dimensional wavelet transform of that segment. This procedure is
illustrated in Fig. 4b. This operation is repeated for all rows and all the values are represented
as a vector as follows:

$$ E_l = [e_{l,1,1}, e_{l,1,2}, e_{l,1,3}, e_{l,1,4}, \ldots, e_{l,K,1}, e_{l,K,2}, e_{l,K,3}, e_{l,K,4}]^T $$

assuming that the level of the 1-dimensional wavelet transform is 3. After the summation, the
size of the resulting vector is larger than that of the original vectors. For instance, if the level of
the 1-dimensional wavelet transform is 3 and the size of the original vectors is K, the size of the
resulting vector will be 4K. Then, the window is moved by a predetermined amount and the
procedure is repeated. After finishing the procedure over the entire sequence of vectors, a new
sequence of vectors, whose size is larger than that of the original vectors, is obtained. This new
sequence of vectors contains information on temporal frequency characteristics as well as
spatial frequency characteristics. As previously, the average of these vectors is computed. In
other words, an average vector is obtained as follows:



$$ E_{ave} = \frac{1}{L'} \sum_{l=1}^{L'} E_l $$

where $L'$ is the number of vectors that contain information on temporal frequency
characteristics as well as spatial frequency characteristics. Although the modified 3-dimensional
wavelet transform is used to compute the spatio-temporal frequency characteristics in the above
procedure, there are many other ways to compute differences in spatial and temporal
frequencies. For instance, the conventional 3-dimensional wavelet transform or 3-D Fourier
transform can be used to produce a number of parameters that represent spatio-temporal
frequency components. These differences in spatial and temporal frequencies are represented as
a vector and the optimization technique, which is described in the next embodiment, is applied
to find the best linear combination of the differences, producing a number that will be used as
an objective score. It is noted that there are many other transforms which can be used for
computing spatial and temporal frequencies, including the Haar transform and the discrete
cosine transform.
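The windowed temporal analysis of this embodiment can be sketched as follows. The sketch assumes a Haar wavelet with pairwise averaging for the 1-dimensional transform, a window length divisible by 2 raised to the number of levels, and a window length of 8 moved in steps of 4. These values and all function names are illustrative assumptions, since the specification leaves the wavelet, the window size, and the step unspecified.

```python
def haar_subbands_1d(segment, levels):
    """Multi-level 1-D Haar transform; returns levels + 1 subbands, lowest frequency first."""
    subbands = []
    current = list(segment)
    for _ in range(levels):
        half = len(current) // 2
        low = [(current[2 * k] + current[2 * k + 1]) / 2.0 for k in range(half)]
        high = [(current[2 * k] - current[2 * k + 1]) / 2.0 for k in range(half)]
        subbands.insert(0, high)  # coarser detail bands end up in front
        current = low
    subbands.insert(0, current)
    return subbands


def temporal_vectors(difference_vectors, window=8, step=4, levels=3):
    """Slide a window along the frame axis and return, for each window position,
    a vector holding (levels + 1) squared sums for each of the K rows."""
    rows = list(zip(*difference_vectors))  # row j: j-th block difference over time
    num_frames = len(difference_vectors)
    result = []
    for start in range(0, num_frames - window + 1, step):
        vector = []
        for row in rows:
            segment = row[start:start + window]
            for subband in haar_subbands_1d(segment, levels):
                vector.append(sum(c * c for c in subband))
        result.append(vector)
    return result


# Hypothetical sequence of 12 difference vectors with K = 2:
# block 1 is constant over time, block 2 flickers between 0 and 4.
difference_vectors = [[2.0, 0.0 if t % 2 == 0 else 4.0] for t in range(12)]
vectors = temporal_vectors(difference_vectors)
print(len(vectors), len(vectors[0]))  # 2 window positions, 4K = 8 values each
print(vectors[0])
```

In this example the constant row contributes only to its lowest temporal subband, while the flickering row also contributes to its highest temporal subband, which is the kind of temporal information the spatial-only vector of equation (3) cannot capture.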
Embodiment 4

Whether one uses the 2-dimensional wavelet transform or the modified 3-dimensional wavelet 
transform or the conventional 3-dimensional wavelet transform, a single vector eventually 
represents the difference between the source and the processed videos. From this vector, a 
number needs to be computed as a weighted sum of the elements of the vector so that the 
number will be used as an objective score. In other words, this new number is generated as 
follows: 

$$ y = W^T D \qquad (4) $$

where the superscript $T$ represents transpose, $W = [w_1, w_2, \ldots, w_K]^T$,
$D = [d_1, d_2, \ldots, d_K]^T$ and $K$ is the size of the vector.
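As a small numerical illustration, the averaging of per-frame vectors and the weighted sum of equation (4) can be sketched as follows. The function names, the weights, and the sample vectors are hypothetical.

```python
def average_vector(vectors):
    """Element-wise mean of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def objective_score(weights, average):
    """y = W^T D: weighted sum of the elements of the averaged vector."""
    return sum(w * d for w, d in zip(weights, average))


frame_vectors = [[4.0, 2.0, 0.0], [6.0, 2.0, 2.0]]  # two frames, K = 3
D = average_vector(frame_vectors)                   # [5.0, 2.0, 1.0]
W = [0.5, 1.0, 2.0]                                 # hypothetical weights
print(D, objective_score(W, D))                     # 2.5 + 2.0 + 2.0 = 6.5
```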

Let x be the subjective score of the processed video such as DMOS (difference mean opinion 
score). Then, x and y can be considered as random variables. The goal is to make the 
correlation coefficient between x and y as high as possible by carefully choosing the weight 
vector W. It is noted that the absolute value of the correlation coefficient is important. In other 
words, two objective testing methods, whose correlation coefficients are 0.9 and -0.9, are 
considered to provide the same performance. 
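The remark about the sign can be checked numerically with the usual sample (Pearson) correlation coefficient. In the sketch below, the scores are made-up values used only for illustration.

```python
import math


def correlation(xs, ys):
    """Sample correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    return cov / math.sqrt(var_x * var_y)


subjective = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical subjective scores
objective = [1.1, 1.9, 3.2, 3.8, 5.0]   # hypothetical objective scores
rho = correlation(subjective, objective)
rho_flipped = correlation(subjective, [-y for y in objective])
print(rho, rho_flipped)  # same magnitude, opposite sign
```

Negating every objective score flips the sign of the coefficient but leaves its magnitude unchanged, so both scorings predict the subjective scores equally well.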

The correlation coefficient between two random variables is defined as follows: 

$$ \rho = \frac{Cov(x, y)}{\sqrt{Var(x)\,Var(y)}}. $$

By substituting $y = W^T D$, $\rho$ becomes

$$ \rho = \frac{Cov(x, W^T D)}{\sqrt{Var(x)\,Var(W^T D)}} = \frac{Cov(x, W^T D)}{\sqrt{Var(x)\,W^T \Sigma_D W}} = \frac{E(x W^T D) - m_x E(W^T D)}{\sqrt{Var(x)\,W^T \Sigma_D W}} $$

where $\Sigma_D$ is the covariance matrix of $D$ of equation (4), $m_x$ is the mean of $x$, and
$E(\cdot)$ is the expectation operator. For random variables, the expectation is computed as follows:

$$ E(x) = \int_{-\infty}^{\infty} x f_x(x)\, dx $$

where $f_x(x)$ is the probability density function of $x$.
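The identity $Var(W^T D) = W^T \Sigma_D W$ used in the expression for $\rho$ follows directly from the definition of the covariance matrix. The short derivation below is added for completeness; the symbol $m_D = E(D)$ is notation introduced here and does not appear in the original text.

```latex
\begin{aligned}
Var(W^T D) &= E\left[ \left( W^T D - W^T m_D \right)^2 \right] \\
           &= E\left[ W^T (D - m_D)(D - m_D)^T W \right] \\
           &= W^T E\left[ (D - m_D)(D - m_D)^T \right] W \\
           &= W^T \Sigma_D W .
\end{aligned}
```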

Without loss of generality, it may be assumed that $m_x = 0$ and $Var(x) = 1$, which can be done
by normalization and translation. Such normalization and translation do not affect the
correlation coefficient with other random variables. Then, the correlation coefficient is
expressed by

$$ \rho = \frac{W^T E(xD)}{\sqrt{Var(x)\,W^T \Sigma_D W}} = \frac{W^T Q}{\sqrt{W^T \Sigma_D W}} $$

where $Q = E(xD)$.

The goal is to find $W$ that maximizes the correlation coefficient $\rho$. In order to simplify the
equation, $\rho^2$ may be maximized instead of $\rho$ since the optimal weight vector $W$ will be the
same. Then, $\rho^2$ is given by

$$ \rho^2 = \frac{(W^T Q)(W^T Q)^T}{W^T \Sigma_D W} = \frac{W^T Q Q^T W}{W^T \Sigma_D W} = \frac{W^T \Sigma_Q W}{W^T \Sigma_D W} $$

where $\Sigma_Q = Q Q^T$. Since the goal is to find $W$ that maximizes $\rho^2$, the gradient of
$\rho^2$ should be computed. Now it is straightforward to compute the gradient of $\rho^2$ as follows:

$$
\begin{aligned}
\frac{\partial \rho^2}{\partial W} &= \frac{\partial}{\partial W} \left[ W^T \Sigma_Q W \left( W^T \Sigma_D W \right)^{-1} \right] \\
&= 2 \Sigma_Q W \left( W^T \Sigma_D W \right)^{-1} - 2 \Sigma_D W \left( W^T \Sigma_Q W \right) \left( W^T \Sigma_D W \right)^{-2} = 0 \\
&\Rightarrow \Sigma_Q W - \Sigma_D W \left( W^T \Sigma_Q W \right) \left( W^T \Sigma_D W \right)^{-1} = 0 \\
&\Rightarrow \Sigma_Q W - \Sigma_D W \rho^2 = 0 \\
&\Rightarrow \Sigma_Q W = \Sigma_D W \rho^2 \\
&\Rightarrow \Sigma_D^{-1} \Sigma_Q W = \rho^2 W .
\end{aligned}
$$

As can be seen in the above equations, $W$ is an eigenvector of $\Sigma_D^{-1} \Sigma_Q$ and $\rho^2$ is
an eigenvalue of $\Sigma_D^{-1} \Sigma_Q$. Therefore, the eigenvectors of $\Sigma_D^{-1} \Sigma_Q$ are first
computed and the eigenvector corresponding to the largest eigenvalue $\lambda$ is used as the
optimal weight vector $W$. Since $\lambda = \rho^2$, the correlation coefficient will be the largest
when the eigenvector corresponding to the largest eigenvalue is used as the optimal weight
vector $W$.
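A minimal sketch of this optimization is given below. It relies on an observation that is not stated in the text: because $\Sigma_Q = Q Q^T$ has rank one, $\Sigma_D^{-1} \Sigma_Q W = (Q^T W)\,\Sigma_D^{-1} Q$, so the only eigenvector with a nonzero eigenvalue is proportional to $\Sigma_D^{-1} Q$ and can be obtained by solving one linear system instead of a full eigen-decomposition. The sketch also assumes that $\Sigma_D$ and $Q$ are estimated from sample data and that $\Sigma_D$ is invertible. All function names and data values are hypothetical.

```python
import math


def solve_linear(matrix, rhs):
    """Solve matrix * w = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def correlation(xs, ys):
    """Sample correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var)


def optimal_weights(vectors, scores):
    """Return W proportional to inv(Sigma_D) Q, estimated from samples."""
    n, k = len(vectors), len(vectors[0])
    mean_x = sum(scores) / n
    std_x = math.sqrt(sum((s - mean_x) ** 2 for s in scores) / n)
    x = [(s - mean_x) / std_x for s in scores]  # enforce m_x = 0, Var(x) = 1
    mean_d = [sum(v[i] for v in vectors) / n for i in range(k)]
    centered = [[v[i] - mean_d[i] for i in range(k)] for v in vectors]
    sigma_d = [[sum(c[i] * c[j] for c in centered) / n for j in range(k)]
               for i in range(k)]
    # Q = E(xD); centering D does not change it because x has zero mean.
    q = [sum(xv * c[i] for xv, c in zip(x, centered)) / n for i in range(k)]
    return solve_linear(sigma_d, q)


def score_correlation(weights, vectors, scores):
    """Correlation between the subjective scores and y = W^T D."""
    y = [sum(w * d for w, d in zip(weights, v)) for v in vectors]
    return correlation(scores, y)


# Hypothetical data: six processed videos, K = 2 measurements each.
vectors = [[1.0, 5.0], [2.0, 3.0], [3.0, 6.0], [4.0, 2.0], [5.0, 7.0], [6.0, 4.0]]
scores = [2.0, 2.5, 4.5, 3.5, 6.5, 5.5]
W = optimal_weights(vectors, scores)
print(W, score_correlation(W, vectors, scores))
```

With these weights, the magnitude of the correlation is at least as large as that obtained with any other weight vector on the same data, including weight vectors that select a single measurement.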

It is noted that vector D in equation (4) can be any vector. For example, each element of vector 
D may represent any measurements of video quality and the proposed optimization procedure 
can be used to find the optimal weight vector W, which provides the largest correlation 
coefficient with the subjective scores. In other words, instead of using the wavelet transform to 
compute differences in the spatial and temporal frequency components, one can use any other 
measurements to measure video quality and then utilize the optimization method to find the best 
linear combination of various measurements. Then, the final objective score will provide the 
largest correlation coefficient with the subjective scores. 



